Speech recognition for a distant moving speaker based on HMM composition and separation
Abstract
This paper describes a hands-free speech recognition method based on HMM composition and separation for speech contaminated not only by additive noise but also by an acoustic transfer function. The method realizes an improved user interface such that a user is not encumbered by microphone equipment in noisy and reverberant environments. In this approach, an attempt is made to model acoustic transfer functions by means of an ergodic HMM [1]. The states of this HMM correspond to different positions of the sound source. It can represent the positions of the sound sources, even if the speaker moves. The HMM parameters of the acoustic transfer function are estimated by HMM separation [2]. The method is obtained through the reverse of the process of HMM composition, where the model parameters are estimated by maximizing the likelihood of adaptation data uttered from an unknown position. Therefore, measurement of impulse responses is not required. In this paper, we record the speech of a distant moving speaker in real environments. The results of experiments for the speech of a distant moving speaker clarified the effectiveness of HMM composition and separation.

In hands-free speech recognition, one of the key issues as regards practical use is the development of a technology that allows accurate recognition of noisy and reverberant speech. Many methods have been presented for solving problems caused by additive noise and convolutional distortion in robust speech recognition. Two common examples of such methods are the speech enhancement and model compensation approaches. For the speech enhancement approach, spectral subtraction for additive noise and cepstral mean normalization for convolutional distortion have been proposed (e.g., [3, 4]).
For the model compensation approach, the conventional multi-template technique, model adaptation (e.g., [5, 6]) and model (de)composition methods (e.g., [1, 7, 8, 9, 10]) have been developed. We applied HMM composition to the recognition of speech contaminated not only by additive noise but also by the reverberation of the room [1]. We also proposed HMM separation for estimating the HMM parameters of an acoustic transfer function [2]. The model parameters are estimated by maximizing the likelihood of adaptation data uttered from an unknown position. This paper describes the performance of the HMM composition and separation for recognition of the speech of a distant moving speaker. The speech of the distant moving speaker is recognized by …
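As a rough illustration of the composition idea described above, the sketch below combines a clean-speech HMM with an ergodic acoustic-transfer-function HMM whose states stand for speaker positions. It is not the authors' implementation. It assumes single-Gaussian state outputs with diagonal covariance in the cepstral domain, where convolution with a transfer function becomes addition, and it leaves out the additive-noise step entirely. The function name `compose_hmms` and all parameter names are hypothetical.

```python
import numpy as np


def compose_hmms(trans_s, mean_s, var_s, trans_h, mean_h, var_h):
    """Compose a speech HMM with a transfer-function HMM (cepstral domain).

    Assumption: both models have one diagonal Gaussian per state and evolve
    independently, so the composed model lives on the product state space.
    """
    # Product-state transitions: state (i, j) -> (k, l) has probability
    # trans_s[i, k] * trans_h[j, l]; np.kron orders states as i * n_h + j.
    trans = np.kron(trans_s, trans_h)
    dim = mean_s.shape[1]
    # Convolution is additive in the cepstral domain, so the means add;
    # under the independence assumption the diagonal variances add as well.
    mean = (mean_s[:, None, :] + mean_h[None, :, :]).reshape(-1, dim)
    var = (var_s[:, None, :] + var_h[None, :, :]).reshape(-1, dim)
    return trans, mean, var


# Toy example: 2 speech states, 3 speaker positions, 4 cepstral coefficients.
rng = np.random.default_rng(0)
trans_s = np.array([[0.9, 0.1], [0.0, 1.0]])   # left-to-right speech HMM
trans_h = np.full((3, 3), 1.0 / 3.0)           # ergodic position HMM
mean_s = rng.normal(size=(2, 4))
mean_h = rng.normal(size=(3, 4))
var_s = np.ones((2, 4))
var_h = 0.5 * np.ones((3, 4))

trans, mean, var = compose_hmms(trans_s, mean_s, var_s, trans_h, mean_h, var_h)
print(trans.shape, mean.shape)
```

Because the position HMM is ergodic, every composed state can reach every position at the next frame, which is how such a model can follow a moving speaker. HMM separation, as the text describes it, runs the other way: the transfer-function parameters (here `mean_h`, `var_h`, `trans_h`) would be estimated from adaptation data by maximizing likelihood rather than supplied directly.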
Similar resources
HMM-separation-based speech recognition for a distant moving speaker
This paper presents a hands-free speech recognition method based on HMM composition and separation for speech contaminated not only by additive noise but also by an acoustic transfer function. The method realizes an improved user interface such that a user is not encumbered by microphone equipment in noisy and reverberant environments. The use of HMM composition has already been proposed for co...
Improved HMM Separation for Distant-Talking Speech Recognition
In distant-talking speech recognition, the recognition accuracy is seriously degraded by reverberation and environmental noise. A robust speech recognition technique in such environments, HMM separation and composition, has been described in [1]. HMM separation estimates the model parameters of the acoustic transfer function using adaptation data uttered from an unknown position in noisy and re...
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy is continuously improving, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...